{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 向量存储(Vector Stores)\n",
    "## 理解向量存储\n",
    "将文本向量化之后，下一步就是进行向量的存储。这部分包含两块：\n",
    "- **向量的存储** ：将非结构化数据向量化后，完成存储\n",
    "- **向量的查询** ：查询时，嵌入非结构化查询并检索与嵌入查询“**最相似**”的嵌入向量。即具有 **相似性检索能力**\n",
    "\n",
    "典型的介绍如下：\n",
    "| 向量数据库 | 描述 |\n",
    "|--|--|\n",
    "| Chroma | 开源、免费的嵌入式数据库 |\n",
    "| FAISS | Meta出品，开源、免费，Facebook AI相似性搜索服务。（Facebook AI Similarity Search，Facebook AI 相似性搜索库） /fæs/ |\n",
    "| Milvus | 用于存储、索引和管理由深度神经网络和其他ML模型产生的大量嵌入向量的数据库 |\n",
    "| Pinecone | 具有广泛功能的向量数据库 |\n",
    "| Redis | 基于Redis的检索器  |"
   ],
   "id": "c45ff580bdb6512e"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "## 文档嵌入模型",
   "id": "4a918c78db6633f4"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T09:47:49.774305Z",
     "start_time": "2025-11-24T09:47:48.445992Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import os\n",
    "\n",
    "import dotenv\n",
    "import langchain_openai\n",
    "\n",
    "dotenv.load_dotenv(override=True)\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"OPENAI_API_KEY\")\n",
    "os.environ[\"OPENAI_BASE_URL\"] = os.getenv(\"OPENAI_BASE_URL\")\n",
    "\n",
    "EMBEDDING_MODEL = langchain_openai.OpenAIEmbeddings(model=\"text-embedding-ada-002\")"
   ],
   "id": "8f525f1d3120e86a",
   "outputs": [],
   "execution_count": 1
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## 数据的存储\n",
    "> 举例1：从TXT文档中加载数据，向量化后存储到Chroma数据库"
   ],
   "id": "f4ce094d7ee9dae4"
  },
  {
   "cell_type": "code",
   "id": "initial_id",
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2025-11-24T11:20:04.886502Z",
     "start_time": "2025-11-24T11:20:03.496176Z"
    }
   },
   "source": [
    "from langchain_text_splitters import CharacterTextSplitter\n",
    "from langchain_community.vectorstores import Chroma\n",
    "from langchain_community.document_loaders import TextLoader\n",
    "\n",
    "# 加载文档\n",
    "print(\">>>>>>>>>>>>>开始加载文档...\")\n",
    "text_loader = TextLoader(\"asset/load/09-ai1.txt\", encoding=\"utf-8\")\n",
    "docs = text_loader.load()\n",
    "print(len(docs))\n",
    "\n",
    "# 拆分文档\n",
    "print(\">>>>>>>>>>>>>开始拆分文档...\")\n",
    "text_splitter = CharacterTextSplitter(\n",
    "    chunk_size=1000,\n",
    "    chunk_overlap=100\n",
    ")\n",
    "chunks = text_splitter.split_documents(docs)\n",
    "print(len(chunks))\n",
    "\n",
    "# 文档向量化\n",
    "print(\">>>>>>>>>>>>>开始向量化并存储文档...\")\n",
    "db = Chroma.from_documents(chunks, EMBEDDING_MODEL)\n",
    "\n",
    "# 查询\n",
    "print(\">>>>>>>>>>>>>开始查询...\")\n",
    "query = \"人工智能的核心技术都有啥？\"\n",
    "\n",
    "result = db.similarity_search(query)\n",
    "print(len(result))\n",
    "print(result[0].page_content)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">>>>>>>>>>>>>开始加载文档...\n",
      "1\n",
      ">>>>>>>>>>>>>开始拆分文档...\n",
      "3\n",
      ">>>>>>>>>>>>>开始向量化并存储文档...\n",
      ">>>>>>>>>>>>>开始查询...\n",
      "4\n",
      "3. 人工智能的核心技术\n",
      "3.1 机器学习\n",
      "机器学习是人工智能的核心技术之一，通过算法使计算机从数据中学习并做出决策。常见的机器学习算法包括监督学习、无监督学习和强化学习。监督学习通过标记数据进行训练，无监督学习则从未标记数据中寻找模式，强化学习则通过与环境交互来优化决策。\n",
      "3.2 深度学习\n",
      "深度学习是机器学习的一个子领域，通过多层神经网络进行特征提取和模式识别。深度学习在图像识别、自然语言处理、语音识别等领域取得了显著成果。常见的深度学习模型包括卷积神经网络（CNN）、循环神经网络（RNN）和长短期记忆网络（LSTM）。\n",
      "3.3 自然语言处理\n",
      "自然语言处理（NLP）是人工智能的一个重要分支，致力于使计算机能够理解和生成人类语言。NLP技术广泛应用于机器翻译、情感分析、文本分类等领域。近年来，基于深度学习的NLP模型（如BERT、GPT）在语言理解任务中取得了突破性进展。\n",
      "3.4 计算机视觉\n",
      "计算机视觉是人工智能的另一个重要分支，致力于使计算机能够理解和处理图像和视频。计算机视觉技术广泛应用于图像识别、目标检测、人脸识别等领域。深度学习模型（如CNN）在计算机视觉任务中取得了显著成果。\n",
      "\n",
      "4. 人工智能的应用领域\n",
      "4.1 医疗健康\n",
      "人工智能在医疗健康领域的应用包括疾病诊断、药物研发、个性化医疗等。通过分析医学影像和患者数据，人工智能可以帮助医生更准确地诊断疾病，提高治疗效果。\n",
      "4.2 金融\n",
      "人工智能在金融领域的应用包括风险评估、欺诈检测、算法交易等。通过分析市场数据和交易记录，人工智能可以帮助金融机构做出更明智的决策，提高运营效率。\n",
      "4.3 教育\n",
      "人工智能在教育领域的应用包括个性化学习、智能辅导、自动评分等。通过分析学生的学习数据，人工智能可以为学生提供个性化的学习建议，提高学习效果。\n",
      "4.4 交通\n",
      "人工智能在交通领域的应用包括自动驾驶、交通管理、智能导航等。通过分析交通数据和路况信息，人工智能可以帮助优化交通流量，提高交通安全。\n"
     ]
    }
   ],
   "execution_count": 11
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "> 举例2：操作csv文档，并向量化",
   "id": "9b1912c68e93accb"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:19:43.862680Z",
     "start_time": "2025-11-24T11:19:42.933103Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from langchain_community.document_loaders import CSVLoader\n",
    "\n",
    "print(\">>>>>>>>>>>>>加载并第一次拆分csv文档...\")\n",
    "csv_loader = CSVLoader(file_path='asset/load/03-load.csv')\n",
    "docs = csv_loader.load_and_split()\n",
    "print(len(docs))\n",
    "\n",
    "print(\">>>>>>>>>>>>>第二次拆分文档...\")\n",
    "text_splitter = CharacterTextSplitter.from_tiktoken_encoder(\n",
    "    chunk_size=500\n",
    ")\n",
    "chunks = text_splitter.split_documents(docs)\n",
    "print(len(chunks))\n",
    "\n",
    "print(\">>>>>>>>>>>>>开始向量化并存储文档...\")\n",
    "db_path = \"./chroma_db\"\n",
    "db = Chroma.from_documents(chunks, EMBEDDING_MODEL, persist_directory=db_path)  # 持久化到指定目录"
   ],
   "id": "924fb5ce687288fa",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">>>>>>>>>>>>>加载并第一次拆分csv文档...\n",
      "4\n",
      ">>>>>>>>>>>>>第二次拆分文档...\n",
      "4\n",
      ">>>>>>>>>>>>>开始向量化并存储文档...\n"
     ]
    }
   ],
   "execution_count": 10
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## 数据的检索\n",
    "举例：一个包含构建Chroma向量数据库以及向量检索的代码"
   ],
   "id": "8d21b73b4772bb0d"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### 前置代码：",
   "id": "dee4414f850ac6bb"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:23:23.267058Z",
     "start_time": "2025-11-24T11:23:21.686687Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# 1.导入相关依赖\n",
    "from langchain_chroma import Chroma\n",
    "from langchain_core.documents import Document\n",
    "\n",
    "# 2.定义文档\n",
    "raw_documents = [\n",
    "    Document(\n",
    "        page_content=\"葡萄是一种常见的水果，属于葡萄科葡萄属植物。它的果实呈圆形或椭圆形，颜色有绿色、紫色、红色等多种。葡萄富含维生素C和抗氧化物质，可以直接食用或酿造成葡萄酒。\",\n",
    "        metadata={\"source\": \"水果\", \"type\": \"植物\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"白菜是十字花科蔬菜，原产于中国北方。它的叶片层层包裹形成紧密的球状，口感清脆微甜。白菜富含膳食纤维和维生素K，常用于制作泡菜、炒菜或煮汤。\",\n",
    "        metadata={\"source\": \"蔬菜\", \"type\": \"植物\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"狗是人类最早驯化的动物之一，属于犬科。它们具有高度社会性，能理解人类情绪，常被用作宠物、导盲犬或警犬。不同品种的狗在体型、毛色和性格上有很大差异。\",\n",
    "        metadata={\"source\": \"动物\", \"type\": \"哺乳动物\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"猫是小型肉食性哺乳动物，性格独立但也能与人类建立亲密关系。它们夜视能力极强，擅长捕猎老鼠。家猫的品种包括波斯猫、暹罗猫等，毛色和花纹多样。\",\n",
    "        metadata={\"source\": \"动物\", \"type\": \"哺乳动物\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"人类是地球上最具智慧的生物，属于灵长目人科。现代人类（智人）拥有高度发达的大脑，创造了语言、工具和文明。人类的平均寿命约70-80年，分布在全球各地。\",\n",
    "        metadata={\"source\": \"生物\", \"type\": \"灵长类\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"太阳是太阳系的中心恒星，直径约139万公里，主要由氢和氦组成。它通过核聚变反应产生能量，为地球提供光和热。太阳活动周期约为11年，会影响地球气候。\",\n",
    "        metadata={\"source\": \"天文\", \"type\": \"恒星\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"长城是中国古代的军事防御工程，总长度超过2万公里。它始建于春秋战国时期，秦朝连接各段，明朝大规模重修。长城是世界文化遗产和人类建筑奇迹。\",\n",
    "        metadata={\"source\": \"历史\", \"type\": \"建筑\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"量子力学是研究微观粒子运动规律的物理学分支。它提出了波粒二象性、测不准原理等概念，彻底改变了人类对物质世界的认知。量子计算机正是基于这一理论发展而来。\",\n",
    "        metadata={\"source\": \"物理\", \"type\": \"科学\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"《红楼梦》是中国古典文学四大名著之一，作者曹雪芹。小说以贾、史、王、薛四大家族的兴衰为背景，描绘了贾宝玉与林黛玉的爱情悲剧，反映了封建社会的种种矛盾。\",\n",
    "        metadata={\"source\": \"文学\", \"type\": \"小说\"}\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"新冠病毒（SARS-CoV-2）是一种可引起呼吸道疾病的冠状病毒。它通过飞沫传播，主要症状包括发热、咳嗽、乏力。疫苗和戴口罩是有效的预防措施。\",\n",
    "        metadata={\"source\": \"医学\", \"type\": \"病毒\"}\n",
    "    )\n",
    "]\n",
    "# 3.创建向量数据库\n",
    "db = Chroma.from_documents(\n",
    "    documents=raw_documents,\n",
    "    embedding=EMBEDDING_MODEL,\n",
    "    persist_directory=\"./asset/chroma-3\",\n",
    ")"
   ],
   "id": "3077e04730883c72",
   "outputs": [],
   "execution_count": 12
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### 支持直接对问题向量查询（similarity_search_by_vector）",
   "id": "b65c937eec177022"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:28:28.025616Z",
     "start_time": "2025-11-24T11:28:27.282664Z"
    }
   },
   "cell_type": "code",
   "source": [
    "query = \"哺乳动物\"\n",
    "embedding_vector = EMBEDDING_MODEL.embed_query(query)\n",
    "result = db.similarity_search_by_vector(embedding_vector, k=3)\n",
    "print(len(result))\n",
    "for r in result:\n",
    "    print(f\">>>>>>>>>>>>>\\nr.metadata: {r.metadata}\")\n",
    "    print(f\"r.page_content: {r.page_content}\")"
   ],
   "id": "42921d360e4cd0bf",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      ">>>>>>>>>>>>>\n",
      "r.metadata: {'source': '动物', 'type': '哺乳动物'}\n",
      "r.page_content: 猫是小型肉食性哺乳动物，性格独立但也能与人类建立亲密关系。它们夜视能力极强，擅长捕猎老鼠。家猫的品种包括波斯猫、暹罗猫等，毛色和花纹多样。\n",
      ">>>>>>>>>>>>>\n",
      "r.metadata: {'type': '哺乳动物', 'source': '动物'}\n",
      "r.page_content: 狗是人类最早驯化的动物之一，属于犬科。它们具有高度社会性，能理解人类情绪，常被用作宠物、导盲犬或警犬。不同品种的狗在体型、毛色和性格上有很大差异。\n",
      ">>>>>>>>>>>>>\n",
      "r.metadata: {'source': '生物', 'type': '灵长类'}\n",
      "r.page_content: 人类是地球上最具智慧的生物，属于灵长目人科。现代人类（智人）拥有高度发达的大脑，创造了语言、工具和文明。人类的平均寿命约70-80年，分布在全球各地。\n"
     ]
    }
   ],
   "execution_count": 14
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### 相似性检索，支持过滤元数据（filter）",
   "id": "a66be43de9f8041b"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:30:31.764536Z",
     "start_time": "2025-11-24T11:30:30.826283Z"
    }
   },
   "cell_type": "code",
   "source": [
    "query = \"哺乳动物\"\n",
    "embedding_vector = EMBEDDING_MODEL.embed_query(query)\n",
    "result = db.similarity_search_by_vector(embedding_vector, k=3, filter={\"type\": \"哺乳动物\"})\n",
    "print(len(result))\n",
    "for r in result:\n",
    "    print(f\">>>>>>>>>>>>>\\nr.metadata: {r.metadata}\")\n",
    "    print(f\"r.page_content: {r.page_content}\")"
   ],
   "id": "8791932cf7a42400",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      ">>>>>>>>>>>>>\n",
      "r.metadata: {'type': '哺乳动物', 'source': '动物'}\n",
      "r.page_content: 猫是小型肉食性哺乳动物，性格独立但也能与人类建立亲密关系。它们夜视能力极强，擅长捕猎老鼠。家猫的品种包括波斯猫、暹罗猫等，毛色和花纹多样。\n",
      ">>>>>>>>>>>>>\n",
      "r.metadata: {'type': '哺乳动物', 'source': '动物'}\n",
      "r.page_content: 狗是人类最早驯化的动物之一，属于犬科。它们具有高度社会性，能理解人类情绪，常被用作宠物、导盲犬或警犬。不同品种的狗在体型、毛色和性格上有很大差异。\n"
     ]
    }
   ],
   "execution_count": 15
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### 通过L2距离分数进行搜索（similarity_search_with_score）",
   "id": "3f762fde53c61755"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:33:34.342139Z",
     "start_time": "2025-11-24T11:33:32.145609Z"
    }
   },
   "cell_type": "code",
   "source": [
    "query = \"量子力学是什么？\"\n",
    "result = db.similarity_search_with_score(query)\n",
    "for doc, score in result:\n",
    "    print(f\">>>>>>>>>>>>>\\nscore: {score}\")\n",
    "    print(f\"doc.metadata: {doc.metadata}\")\n",
    "    print(f\"doc.page_content: {doc.page_content}\")"
   ],
   "id": "ab0c1d350a51a319",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">>>>>>>>>>>>>\n",
      "score: 0.18318983912467957\n",
      "doc.metadata: {'source': '物理', 'type': '科学'}\n",
      "doc.page_content: 量子力学是研究微观粒子运动规律的物理学分支。它提出了波粒二象性、测不准原理等概念，彻底改变了人类对物质世界的认知。量子计算机正是基于这一理论发展而来。\n",
      ">>>>>>>>>>>>>\n",
      "score: 0.45052671432495117\n",
      "doc.metadata: {'source': '天文', 'type': '恒星'}\n",
      "doc.page_content: 太阳是太阳系的中心恒星，直径约139万公里，主要由氢和氦组成。它通过核聚变反应产生能量，为地球提供光和热。太阳活动周期约为11年，会影响地球气候。\n",
      ">>>>>>>>>>>>>\n",
      "score: 0.462968647480011\n",
      "doc.metadata: {'source': '生物', 'type': '灵长类'}\n",
      "doc.page_content: 人类是地球上最具智慧的生物，属于灵长目人科。现代人类（智人）拥有高度发达的大脑，创造了语言、工具和文明。人类的平均寿命约70-80年，分布在全球各地。\n",
      ">>>>>>>>>>>>>\n",
      "score: 0.48633313179016113\n",
      "doc.metadata: {'source': '医学', 'type': '病毒'}\n",
      "doc.page_content: 新冠病毒（SARS-CoV-2）是一种可引起呼吸道疾病的冠状病毒。它通过飞沫传播，主要症状包括发热、咳嗽、乏力。疫苗和戴口罩是有效的预防措施。\n"
     ]
    }
   ],
   "execution_count": 16
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### 通过余弦相似度分数进行搜索（_similarity_search_with_relevance_scores）",
   "id": "a643f278eacef81"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:35:25.775277Z",
     "start_time": "2025-11-24T11:35:24.063846Z"
    }
   },
   "cell_type": "code",
   "source": [
    "docs = db._similarity_search_with_relevance_scores(\n",
    "    \"量子力学是什么?\"\n",
    ")\n",
    "for doc, score in docs:\n",
    "    print(f\">>>>>>>>>>>>>\\nscore: {score}\")\n",
    "    print(f\"doc.metadata: {doc.metadata}\")\n",
    "    print(f\"doc.page_content: {doc.page_content}\")"
   ],
   "id": "e761554f5b7c9302",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">>>>>>>>>>>>>\n",
      "score: 0.8712575621890469\n",
      "doc.metadata: {'type': '科学', 'source': '物理'}\n",
      "doc.page_content: 量子力学是研究微观粒子运动规律的物理学分支。它提出了波粒二象性、测不准原理等概念，彻底改变了人类对物质世界的认知。量子计算机正是基于这一理论发展而来。\n",
      ">>>>>>>>>>>>>\n",
      "score: 0.6840939235908701\n",
      "doc.metadata: {'type': '恒星', 'source': '天文'}\n",
      "doc.page_content: 太阳是太阳系的中心恒星，直径约139万公里，主要由氢和氦组成。它通过核聚变反应产生能量，为地球提供光和热。太阳活动周期约为11年，会影响地球气候。\n",
      ">>>>>>>>>>>>>\n",
      "score: 0.6724540177033739\n",
      "doc.metadata: {'source': '生物', 'type': '灵长类'}\n",
      "doc.page_content: 人类是地球上最具智慧的生物，属于灵长目人科。现代人类（智人）拥有高度发达的大脑，创造了语言、工具和文明。人类的平均寿命约70-80年，分布在全球各地。\n",
      ">>>>>>>>>>>>>\n",
      "score: 0.6546721148026579\n",
      "doc.metadata: {'source': '医学', 'type': '病毒'}\n",
      "doc.page_content: 新冠病毒（SARS-CoV-2）是一种可引起呼吸道疾病的冠状病毒。它通过飞沫传播，主要症状包括发热、咳嗽、乏力。疫苗和戴口罩是有效的预防措施。\n"
     ]
    }
   ],
   "execution_count": 18
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### MMR（最大边际相关性，max_marginal_relevance_search）\n",
    "`MMR` 是一种平衡 相关性 和 多样性 的检索策略，避免返回高度相似的冗余结果。"
   ],
   "id": "7a7bb8536821d5ed"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-24T11:36:44.101700Z",
     "start_time": "2025-11-24T11:36:43.277678Z"
    }
   },
   "cell_type": "code",
   "source": [
    "docs = db.max_marginal_relevance_search(\n",
    "    query=\"量子力学是什么\",\n",
    "    lambda_mult=0.8,  # 侧重相似性\n",
    ")\n",
    "print(\"🔍 关于【量子力学是什么】的搜索结果：\")\n",
    "print(\"=\" * 50)\n",
    "for i, doc in enumerate(docs):\n",
    "    print(f\"\\n📖 结果 {i + 1}:\")\n",
    "    print(f\"📌 内容: {doc.page_content}\")\n",
    "    print(f\"🏷 标签: {', '.join(f'{k}={v}' for k, v in doc.metadata.items())}\")"
   ],
   "id": "ed92e3f488f6094c",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔍 关于【量子力学是什么】的搜索结果：\n",
      "==================================================\n",
      "\n",
      "📖 结果 1:\n",
      "📌 内容: 量子力学是研究微观粒子运动规律的物理学分支。它提出了波粒二象性、测不准原理等概念，彻底改变了人类对物质世界的认知。量子计算机正是基于这一理论发展而来。\n",
      "🏷 标签: type=科学, source=物理\n",
      "\n",
      "📖 结果 2:\n",
      "📌 内容: 太阳是太阳系的中心恒星，直径约139万公里，主要由氢和氦组成。它通过核聚变反应产生能量，为地球提供光和热。太阳活动周期约为11年，会影响地球气候。\n",
      "🏷 标签: source=天文, type=恒星\n",
      "\n",
      "📖 结果 3:\n",
      "📌 内容: 人类是地球上最具智慧的生物，属于灵长目人科。现代人类（智人）拥有高度发达的大脑，创造了语言、工具和文明。人类的平均寿命约70-80年，分布在全球各地。\n",
      "🏷 标签: source=生物, type=灵长类\n",
      "\n",
      "📖 结果 4:\n",
      "📌 内容: 新冠病毒（SARS-CoV-2）是一种可引起呼吸道疾病的冠状病毒。它通过飞沫传播，主要症状包括发热、咳嗽、乏力。疫苗和戴口罩是有效的预防措施。\n",
      "🏷 标签: type=病毒, source=医学\n"
     ]
    }
   ],
   "execution_count": 20
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
